Abstract

This paper examines extending a database of English verbs, grouped into syntactico-semantic classes, with WordNet senses. Probabilistic associations between θ-grids and WordNet verb frames, SEMCOR frequency data, and disambiguation based on an information-theoretic notion of semantic similarity are used. Mapping successes and failures are illustrated with drop.

1 Introduction

We are interested in automatically mapping entries in a database of 4069 English verbs to WordNet senses (Miller and Fellbaum, 1991; Fellbaum, 1998) in order to integrate these lexical resources for multilingual applications such as machine translation and cross-language information retrieval. For example, the English verb drop has many potential translations in Spanish: bajar, caerse, dejar, caer, derribar, disminuir, echar, hundir, soltar, etc. Our database specifies a set of interpretations for the verb drop, differentiated by the context in which they appear in the source language. Integrating these two lexical resources allows us to associate each interpretation with a set of WordNet senses; these, in turn, are used in choosing an appropriate verb in the target language.

Our work in lexical resource integration parallels the building of multilingual thesauri (Hudon, 2001), the mapping of dozens of medical vocabularies to MeSH (2000) within the Unified Medical Language System (UMLS, 2001; Bodenreider and Bean, 2001), and work in ontology integration (Hovy, in press). As semantic resources (e.g., machine-readable dictionaries, thesauri, ontologies) proliferate, we find that their underlying classificatory structures differ, making the establishment of equivalences across them anything but trivial. But to the extent that we can create such mappings, we both extend the power of individual resources and contribute to the larger research effort to build standardized semantic resources, e.g., EAGLES.¹

On the one hand, the verb database contains mostly syntactic information about its entries, much of which applies at the level of the classes used within the database. WordNet, on the other hand, is a significant source of information about semantic relationships, much of which applies at the “synset” level. Thus, by mapping entries in the database to their corresponding WordNet senses, we significantly extend the semantic potential of the verb database. At the same time, the fully mapped database itself becomes a data set in the larger effort to find commonalities across lexical resources.

2 Nature of the Resources

While it is commonly agreed in theory that words may have multiple senses, there is often little agreement in practice about how many senses a given word has or whether word senses should be broadly or narrowly defined (Palmer, 2000). Detailed examination of the treatment of specific words in seemingly comparable dictionaries reveals that words are divided into senses in divergent ways (Fillmore and Atkins, 1992). Establishing equivalences across lexical resources under such circumstances is seldom a matter of generating one-to-one mappings. Indeed, mappings between lexical resources are not necessarily symmetrical; for example, when health experts mapped terms in various terminologies to the UMLS Metathesaurus and could not find an exact match, they opted for more general terms almost ten times as often as for more specific ones (Bean, 2000).

¹Information about the Expert Advisory Group on Language Engineering Standards (EAGLES) is available at http://www.ilc.pi.cnr.it/EAGLES/intro.html.

In our lexical resource integration task, we have sought to identify sets of WordNet senses that best correspond to entries in the verb database, and not vice versa. To understand the challenges involved, it is first necessary to compare the characteristics of the two resources.

2.1 Verb Database

Our database is a classification of 4069 English verbs, based initially on English Verb Classes and Alternations (EVCA) (Levin, 1993) and extended by splitting some classes into subclasses and adding new classes. The resulting 491 classes (e.g., Roll Verbs, Group 1: drop, glide, roll, swing) are referred to here as Levin+ classes. As verbs may be assigned to multiple Levin+ classes, the number of entries in the database is considerably larger, viz., 9611.

Following the model of Dorr and Olsen (1997), each Levin+ class is associated with a thematic grid (henceforth abbreviated θ-grid), which summarizes a verb’s syntactic behavior by specifying its predicate argument structure. For example, the Levin+ class ‘Roll Verbs, Group 1’ is associated with the θ-grid [theme goal], in which a theme and a goal are used (e.g., The ball dropped to the ground).
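The distinction between verbs, Levin+ classes, and database entries can be made concrete with a small data model. The following Python sketch is illustrative only: the names `LevinPlusClass` and `entries` are hypothetical, and the member list used for the change-of-state class is a partial subset chosen for the example, not the database's actual contents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class LevinPlusClass:
    """A Levin+ class: its name, its θ-grid, and its member verbs (illustrative only)."""
    name: str
    theta_grid: tuple[str, ...]
    members: frozenset[str] = field(default_factory=frozenset)


def entries(classes):
    """Expand classes into sorted (verb, class name) database entries.

    A verb that belongs to several classes yields one entry per class, which is
    why the database has more entries than distinct verbs.
    """
    return sorted((verb, c.name) for c in classes for verb in c.members)


# Hypothetical example: two classes that both contain "drop".
roll_1 = LevinPlusClass("Roll Verbs, Group 1", ("theme", "goal"),
                        frozenset({"drop", "glide", "roll", "swing"}))
change = LevinPlusClass("Calibratable changes of state", ("theme",),
                        frozenset({"drop", "fluctuate", "grow"}))  # partial member list
db = entries([roll_1, change])
print(len(db), len({verb for verb, _ in db}))  # 7 6
```

Seven entries for six distinct verbs: the extra entry comes from drop belonging to both classes, the same effect that makes the full database's 9611 entries exceed its 4069 verbs.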

As Levin (1993) convincingly demonstrates, there is a correlation between a verb’s syntactic behavior and its semantics. Thus, while the inclusion of a single verb in multiple Levin+ classes is grounded in syntactic behavior (specifically in its predicate argument structure, as captured in one or more corresponding θ-grids, and in permissible diathesis alternations), it may also reasonably be supposed that the multiple entries of a verb in the database represent different senses of the verb.

2.2 WordNet

WordNet 1.6 covers 10,319 verbs, organized into 12,127 synsets, representing 22,066 verb senses. Most of the verbs in our database (4056 of 4069) are also in WordNet;² these verbs have 12,561 senses in WordNet and belong to 8147 synsets. For the verbs found in both WordNet and the verb database, the ratio of WordNet senses to verbs is 3.10; the ratio of entries to verbs in our database is 2.36. This indicates that WordNet draws finer-grained word sense distinctions than the verb database. Moreover, the basis on which the distinctions are made differs: syntactic behavior in the case of the verb database, semantic relationships in the case of WordNet.

²As we are mapping from entries in the verb classification to WordNet senses, the existence of verbs in WordNet but not in our database is of no significance.
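The two ratios quoted above can be reproduced from the counts given in this section. This small Python check assumes, as Section 2.1 suggests, that each database entry counts as one sense, so the database ratio is entries per verb (9611 / 4069), while the WordNet ratio is senses per verb for the 4056 shared verbs (12,561 / 4056).

```python
def senses_per_verb(senses: int, verbs: int) -> float:
    """Average number of senses (or entries) per verb, rounded to two places."""
    return round(senses / verbs, 2)


# WordNet senses per verb, for the 4056 verbs that are also in the database.
wordnet_ratio = senses_per_verb(12_561, 4_056)
# Verb database: entries per verb, treating each entry as a sense.
database_ratio = senses_per_verb(9_611, 4_069)
print(wordnet_ratio, database_ratio)  # 3.1 2.36
```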

In contrast to the syntactic emphasis of the verb database, WordNet gives mostly semantic information in its entries. For example, WordNet records semantic relations of several types between synsets. Using the semantically tagged Brown corpus files contained in the SEMCOR package, WordNet also indicates how frequently the various senses of a word are used, thus yielding the prior probability of a specific sense for any occurrence of a word. While information about the syntactic behavior of words has not been emphasized in WordNet, such information is increasingly being incorporated. Glosses indirectly indicate the predicate argument structure of verbs in a synset; example sentences and verb frames spell out the predicate argument structure more definitively. To some extent, the verb frames (a set of 35 generic sentence frames, e.g., Somebody ----s somebody something; Something ----s) fill the same role as θ-grids. However, the two are only partially comparable and thus cannot, on their own, support mapping verb database entries to WordNet senses.

It is worth noting that, although the two resources under consideration were constructed according to different principles, WordNet’s relational organization captures some of the same information as decompositional theories of verb meaning, such as the one underlying EVCA (Fellbaum, 1998). Along these same lines, Dang et al. (1998) discuss a refinement of the EVCA class organization and its potential mapping to WordNet senses.

3 Data for Mapping between the Verb Database and WordNet

Since it is not possible to map directly between verb database entries and WordNet senses, we used 1791 entries that had been manually tagged with WordNet senses as training data to generate probabilistic associations between data from the two resources. For example, one of our measures captured the association between θ-grids and WordNet verb frames, from the perspective of both individual θ-roles/verb frames and overall θ-grids/sets of verb frames. This will be referred to as a syntactic similarity measure. We also used a disambiguation algorithm (Resnik, 1999a), based on an information-theoretic notion of semantic similarity (Resnik, 1999b), which computes the confidence that specific WordNet senses hold, given the accompanying set of verbs in the same Levin+ class. This will be referred to as a semantic similarity measure. Finally, we used SEMCOR frequency data to establish the prior probability of specific WordNet senses.
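The estimator behind the syntactic similarity measure is not spelled out here, so the following Python sketch shows one plausible reading under stated assumptions: associations are plain relative frequencies over tagged training entries, computed both per θ-role/verb frame and per whole θ-grid/frame set. The training records and function names are invented for illustration; the frame strings follow WordNet's wording.

```python
from collections import Counter

# Hypothetical training records: (θ-grid of a tagged entry, WordNet verb frames
# of the sense it was tagged with). Invented for illustration.
TRAINING = [
    (("theme",), {"Something ----s"}),
    (("theme",), {"Something ----s"}),
    (("theme",), {"Something ----s", "Somebody ----s"}),
    (("agent", "theme"), {"Somebody ----s something"}),
    (("agent", "theme"), {"Somebody ----s something", "Somebody ----s somebody"}),
]


def role_frame_association(training):
    """P(frame present | θ-role present), estimated by relative frequency."""
    role_counts, pair_counts = Counter(), Counter()
    for grid, frames in training:
        for role in set(grid):
            role_counts[role] += 1
            for frame in frames:
                pair_counts[role, frame] += 1
    return {(r, f): n / role_counts[r] for (r, f), n in pair_counts.items()}


def grid_frameset_association(training):
    """P(exact set of frames | exact θ-grid), estimated by relative frequency."""
    grid_counts, joint_counts = Counter(), Counter()
    for grid, frames in training:
        grid_counts[tuple(grid)] += 1
        joint_counts[tuple(grid), frozenset(frames)] += 1
    return {(g, fs): n / grid_counts[g] for (g, fs), n in joint_counts.items()}


per_role = role_frame_association(TRAINING)
per_grid = grid_frameset_association(TRAINING)
print(per_role["agent", "Somebody ----s something"])  # 1.0
print(per_grid[("agent", "theme"),
               frozenset({"Somebody ----s something", "Somebody ----s somebody"})])  # 0.5
```

In this toy data the agent role always co-occurs with Somebody ----s something (1.0), yet the grid [agent theme] co-occurs with that exact pair of frames only half the time (0.5). Component-level and whole-grid evidence can therefore pull in different directions, which is why both perspectives are kept.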

Based on a handful of probabilistic associations between syntactic and semantic characteristics of the two resources (including the syntactic similarity measure set out above, the information-theoretic semantic similarity measure, and SEMCOR frequency data), we investigated a number of voting schemes for mapping entries in the verb database to WordNet senses. The best results achieved 72% precision and 58% recall, versus a lower bound of 62% precision and 38% recall for the most frequent WordNet sense, and an upper bound of 87% precision and 75% recall for human judgment. Further details of the mapping and its evaluation are available in Green et al. (2001).
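Precision and recall here can be read as ratios over (entry, WordNet sense) pairs: precision is the share of assigned pairs that are correct, and recall is the share of correct pairs that were assigned. The Python sketch below assumes that reading. The gold and predicted sets form a toy subset built from the drop examples in Section 4, not the evaluation data.

```python
def precision_recall(predicted, gold):
    """Precision and recall over (verb database entry, WordNet sense) pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy pairs for drop: (verb database entry, WordNet sense).
gold = {(3, 3), (1, 1)}        # mappings judged appropriate
predicted = {(3, 3), (1, 6)}   # mappings the system assigned
print(precision_recall(predicted, gold))  # (0.5, 0.5)
```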

4 Case Study: Drop

In this section we consider mapping the verb database entries for drop to their corresponding WordNet senses; the examples are taken from the ‘best results’ voting scheme, with two aggregate voters, one based on the product of the half dozen measures indicated above, the other based on their weighted sum. The discussion will focus on the three most salient of those measures: the θ-grid/WordNet verb frame syntactic similarity measure, the Resnik semantic similarity measure, and SEMCOR frequency data. The contribution made by the syntactic similarity measure to the mapping process reflects the degree to which the θ-grid data in the verb database and WordNet’s verb frames capture the same syntactic behavior. The contribution made by the semantic similarity measure reflects the degree of compatibility between the semantics of the EVCA-based verb classes and WordNet’s hierarchical structure.
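The two aggregate voters can be sketched as follows. This is a minimal Python illustration, assuming each measure yields a score in [0, 1] for a candidate (entry, sense) pair. The measure names, scores, and weights are hypothetical, and the rule for combining the two voters' outputs into a final decision is not specified here, so it is left out.

```python
import math


def product_vote(scores):
    """Aggregate voter 1: product of the per-measure scores."""
    return math.prod(scores.values())


def weighted_sum_vote(scores, weights):
    """Aggregate voter 2: weighted sum of the per-measure scores."""
    return sum(weights[name] * value for name, value in scores.items())


# Hypothetical scores for WordNet sense 1 against verb database entry 3
# ("the prices dropped"): a high prior, but zero semantic confidence.
scores = {"prior": 0.35, "syntactic": 0.40, "semantic": 0.0}
weights = {"prior": 0.2, "syntactic": 0.3, "semantic": 0.5}  # hypothetical weights
print(product_vote(scores))                          # 0.0
print(round(weighted_sum_vote(scores, weights), 2))  # 0.19
```

Under the product voter a single zero, such as a zero semantic confidence, vetoes the sense outright; under the weighted sum it only lowers the score.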

There are 8 entries for drop in the verb database, outlined in Table 1; there are 19 senses of drop in WordNet, outlined in Table 2. We will examine 4 cases: (1) an appropriate WordNet sense correctly mapped; (2) an inappropriate WordNet sense correctly not mapped; (3) an appropriate WordNet sense incorrectly not mapped; and (4) an inappropriate WordNet sense incorrectly mapped.
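The four cases are the cells of a standard confusion matrix, from which precision and recall are computed. The following Python sketch labels the four drop examples discussed in this section accordingly. The TP/FP/FN/TN labels are the usual evaluation terms, not the paper's own, and the function name `outcome` is hypothetical.

```python
def outcome(assigned: bool, appropriate: bool) -> str:
    """Classify one (entry, WordNet sense) decision into a confusion-matrix cell."""
    if assigned:
        return "TP" if appropriate else "FP"  # cases (1) and (4)
    return "FN" if appropriate else "TN"      # cases (3) and (2)


# (verb database entry, WordNet sense) -> (assigned?, appropriate?)
examples = {
    (3, 3): (True, True),    # case 1: correctly mapped
    (3, 1): (False, False),  # case 2: correctly not mapped
    (1, 1): (False, True),   # case 3: incorrectly not mapped
    (1, 6): (True, False),   # case 4: incorrectly mapped
}
for pair, (assigned, appropriate) in examples.items():
    print(pair, outcome(assigned, appropriate))
```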

The first case involves a WordNet sense (sense 3; “stock prices dropped”) that our mapping process correctly indicates is an appropriate choice for a verb database entry (sense 3; “the prices dropped”). The sample sentences clearly indicate an exact match between the WordNet sense and the verb database entry. The WordNet sense is the third most frequently occurring sense of drop in SEMCOR, representing almost 12% of its uses. Thus prior probability does not promote this sense very strongly. However, both the syntactic and semantic similarity measures identified this as the most likely sense. The association between the verb database entry’s θ-grid [theme] and the WordNet verb frame Something ----s is particularly strong; the fact that there is only one component in the θ-grid and only one verb frame for the WordNet sense helps strengthen that association. Likewise, the presence of verbs such as appreciate, fluctuate, grow, mushroom and vary in the same Levin+ class strongly points the semantic similarity measure to a WordNet sense in the change domain, where WordNet sense 3 occurs. The strength of the evidence with regard to both syntactic and semantic similarity easily overcomes the weakness of the prior probability measure.
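The prior probabilities discussed in this section follow from the SEMCOR counts in Table 2. The Python sketch below assumes an unsmoothed relative-frequency estimate (whether any smoothing was applied is not stated); under that assumption, senses with zero counts receive a prior of zero. The name `prior` is illustrative.

```python
# SEMCOR counts for the 19 WordNet 1.6 senses of "drop" (Table 2), in sense order.
SEMCOR_COUNTS = [36, 21, 12, 7, 6, 6, 5, 3, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0]


def prior(sense: int, counts=SEMCOR_COUNTS) -> float:
    """Unsmoothed relative-frequency prior P(sense | drop); senses are numbered from 1."""
    return counts[sense - 1] / sum(counts)


print(round(prior(1), 3), round(prior(3), 3))  # 0.353 0.118
```

Out of 102 tagged occurrences, sense 1 accounts for just over one-third and sense 3 for just under 12%, matching the figures given in the text.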

The second case involves a WordNet sense (sense 1; “don’t drop the dishes”) that our mapping process correctly indicates is an inappropriate choice for a verb database entry (sense 3 again; “the prices dropped”). (Surprisingly, both human coders rated WordNet sense 1 a good choice, despite the literal, transitive use of the WordNet sense versus the figurative, intransitive use of the verb database entry!) Over one-third of all occurrences of drop in SEMCOR represent WordNet sense 1; on the basis of prior probability alone, the mapping process will always consider this the most appropriate sense. However, the semantic similarity measure rated this motion sense of drop at a zero level of confidence, which pretty much scotches the possibility of its being assigned. The syntactic similarity measure looked favorably on this sense from the perspective of correlation between individual components of the θ-grid [agent theme] and the WordNet verb frame, since the verb frame Somebody ----s something has a fairly strong association with the presence of a theme, but the verb frame combination (also including Somebody ----s somebody) has only a weak association with the overall θ-grid.

#

Levin+ class

Example sentence

Required θ roles

Optional θ roles

1

Drop

She dropped the book
to the ground.

agent

theme

goal

2

Putting down

I dropped the stone
down to the ground.

agent

theme

mod-loc (down)

source

goal

3

Calibratable changes
of state

The prices dropped.

theme

4

Meander (to/from)

The river drops
from the lake to the sea.

theme

source (from)
goal (to)

5

Meander (path)

The river drops
through the valley.

theme

goal

6

Roll 1

The ball dropped.

theme

7

Roll 2

The ball dropped
into the room.

theme

source

goal

8

Roll down

The stone dropped
down into the ground.

theme

particle (down)

source

goal

Table 1: Senses of drop in Verb Database

Having looked at two successes, we turn now to two failures. The third case involves a WordNet sense (sense 1; “don’t drop the dishes”) that should have been assigned to a verb database entry (sense 1; “she dropped the book to the ground”), but was not. As noted above, this WordNet sense is the most frequently occurring sense of drop in SEMCOR and thus is favored by the prior probability measure. The training data included no instances of the θ-grid for this Levin+ class with the set of verb frames for this WordNet sense, although the association between individual components of the θ-grid and individual WordNet verb frames was fairly strong. Uncharacteristically, the semantic similarity value for this WordNet sense is quite low. The reason turns out to be that drop is the only verb in this Levin+ class. Thus the semantic similarity measure has no evidence for distinguishing among WordNet senses and assigns them all an equal, but insignificant, confidence level. In this case, data sparsity stands in the way of correct sense assignment. It is worth noting, however, that the available evidence promotes the correct sense.
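Why a one-member class leaves the semantic measure without evidence can be seen in a simplified Python sketch in the spirit of Resnik's group disambiguation, adapted here to the verbs of a Levin+ class. Each pair of class members votes, weighted by the information content of their most informative common subsumer, for the senses that subsumer covers. The hierarchy, probabilities, and sense labels (e.g. `drop.change`) are invented; a real system would use WordNet's hypernyms and corpus-derived information content. The uniform fallback when no pair supplies evidence is an assumption chosen to mirror the behavior described above.

```python
import math
from itertools import combinations

# Toy hierarchy (child -> parent): a hypothetical stand-in for WordNet's verb hierarchy.
PARENT = {
    "drop.let_fall": "move", "drop.change": "change_magnitude", "drop.utter": "communicate",
    "grow.change": "change_magnitude", "fluctuate.change": "change_magnitude",
    "change_magnitude": "change", "change": "event", "move": "event", "communicate": "event",
}
# Hypothetical concept probabilities; information content is -log p(c).
PROB = {"event": 1.0, "change": 0.3, "change_magnitude": 0.05, "move": 0.3, "communicate": 0.2}
SENSES = {
    "drop": ["drop.let_fall", "drop.change", "drop.utter"],
    "grow": ["grow.change"],
    "fluctuate": ["fluctuate.change"],
}


def ancestors(concept):
    """The concept itself plus all of its hypernyms."""
    chain = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        chain.add(concept)
    return chain


def disambiguate(group, senses):
    """Confidence in each sense of each verb, given the other verbs in its class."""
    support = {v: dict.fromkeys(senses[v], 0.0) for v in group}
    normalizer = dict.fromkeys(group, 0.0)
    for v1, v2 in combinations(group, 2):
        # Most informative subsumer shared by any sense of v1 and any sense of v2.
        best_ic, best = 0.0, None
        for s1 in senses[v1]:
            for s2 in senses[v2]:
                for c in ancestors(s1) & ancestors(s2):
                    ic = -math.log(PROB[c])
                    if ic > best_ic:
                        best_ic, best = ic, c
        if best is None:
            continue  # only an uninformative subsumer (IC 0) is shared
        for v in (v1, v2):
            normalizer[v] += best_ic
            for s in senses[v]:
                if best in ancestors(s):
                    support[v][s] += best_ic
    result = {}
    for v in group:
        if normalizer[v] == 0.0:
            # No pairwise evidence, e.g. a one-member class: equal, uninformative confidence.
            result[v] = {s: 1 / len(senses[v]) for s in senses[v]}
        else:
            result[v] = {s: support[v][s] / normalizer[v] for s in senses[v]}
    return result


print(disambiguate(["drop", "grow", "fluctuate"], SENSES)["drop"])
print(disambiguate(["drop"], SENSES)["drop"])
```

With grow and fluctuate as classmates, all confidence goes to the change sense and the motion sense gets zero, as in the first two cases; with drop alone, every sense receives the same uninformative value, as in the third.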

The final case involves a WordNet sense (sense 6; “drop a hint”) that was assigned to a verb database entry (sense 1; “she dropped the book to the ground”) but should not have been. Since we are looking at the same verb database entry as in the previous example, it will be instructive to contrast the two WordNet senses. As WordNet senses are listed in order of SEMCOR frequency, sense 6 occurs far less often than sense 1 (6 versus 36 occurrences; see Table 2). As explained before, the semantic similarity measure is unable to distinguish between WordNet senses when the Levin+ class has only one member, as in this case. What drives the different assignment here is the absence of a verb frame from sense 6: sense 1 allows both (Somebody ----s something) and (Somebody ----s somebody), while sense 6 allows only (Somebody ----s something). The association of (Somebody ----s something) with each of the components of the [agent theme goal] θ-grid is much stronger in the training data than is true for (Somebody ----s somebody); moreover, the single verb frame for sense 6 has a much stronger association with the whole θ-grid than does the verb frame pair for sense 1. Data sparseness is again a problem, as is the difference between the classification of syntactic patterns in the two resources.

# | WordNet gloss | Verb frames | SEMCOR count
1 | let fall to the ground; “don’t drop the dishes” | Somebody ----s something; Somebody ----s somebody | 36
2 | fall vertically; “the bombs are dropping on enemy targets” | Something ----s; Somebody ----s | 21
3 | go down in value; “stock prices dropped” | Something ----s | 12
4 | fall or drop to a lower place or level; “he sank to his knees” | Something ----s; Somebody ----s | 7
5 | terminate an association with; “drop him from the Republican ticket” | Somebody ----s somebody; Something ----s somebody | 6
6 | utter casually; “drop a hint” | Somebody ----s something | 6
7 | stop pursuing or acting; “drop a lawsuit” | Somebody ----s something | 5
8 | leave or unload, esp. of passengers or cargo | Somebody ----s something; Somebody ----s somebody; Somebody ----s somebody PP; Somebody ----s something PP | 3
9 | as of trees or people | Somebody ----s something; Somebody ----s somebody; Something ----s something | 2
10 | of games, in sports; “the Giants dropped all 11 of their first 13” | Somebody ----s something | 2
11 | pay out; “spend money” | Somebody ----s something; Somebody ----s something on somebody | 1
12 | lower the pitch of (musical notes) | Somebody ----s something | 1
13 | hang freely; “the light dropped from the ceiling” | Something ----s; Something is ----ing PP | 0
14 | stop associating with; “they dropped her after she had a child out of wedlock” | Somebody ----s somebody | 0
15 | get rid of; “he shed his image as a pushy boss” | Somebody ----s something; Something ----s something | 0
16 | leave undone or leave out; “how could I miss that typo?” | Somebody ----s something; Somebody ----s somebody; Somebody ----s to INFINITIVE | 0
17 | change from one level to another; “she dropped into Army jargon” | Something is ----ing PP; Somebody ----s PP | 0
18 | grow worse; “her condition deteriorated” | Something ----s; Somebody ----s | 0
19 | give birth, used for animals; “the cow dropped her calf this morning” | Something ----s something | 0

Table 2: Senses of drop in WordNet


5 Conclusion

Semantic data in WordNet (SEMCOR frequency data and the hierarchical structure of WordNet) combine with associations between θ-grid information and WordNet verb frames to extend a verb classification based on syntactico-semantic classes with WordNet senses. Data sparseness is a major factor in at least some mapping failures. At the same time, syntax-based measures contribute less to mapping successes than do the semantic similarity and word sense frequency measures. This suggests a greater degree of compatibility between the semantics of Levin+ verb classes and the WordNet relational structure than between the systems the two resources use to reflect verbs’ syntactic behavior.